# Multimodal interaction
Llama 4 Scout 17B 16E Instruct INT4
Other
The Llama 4 series is a family of natively multimodal AI models released by Meta. It uses a Mixture of Experts architecture, supports text and image interaction, and performs strongly across a range of language and vision tasks.
Multimodal Fusion
Transformers, Supports Multiple Languages

fahadh4ilyas
1,864
0
Llama 4 Scout 17B 16E Instruct FP8
Other
The Llama 4 series is a family of natively multimodal AI models released by Meta, supporting text and image interaction. It uses a Mixture of Experts architecture and performs strongly on text and image understanding.
Multimodal Fusion
Transformers, Supports Multiple Languages

fahadh4ilyas
1,760
0
Qwen.qwen2 VL 2B GGUF
Qwen2-VL-2B is a multimodal model that can handle image and text inputs and generate text outputs.
Image-to-Text
DevQuasar
127
0
Videochatonline 4B
MIT
VideoChat-Online is an online video understanding model based on Phi-3-vision-128k-instruct, focused on the video-text-to-text task.
Video-to-Text
MCG-NJU
61
0
Uground V1 7B
Apache-2.0
UGround is a strong GUI visual grounding model trained with a simple recipe, developed jointly by the OSU NLP Group and Orby AI.
Image-to-Text
Transformers, English

osunlp
2,053
12
Pae Llava 7b
PAE-LLaVa-7B is a foundation-model web agent built on the PAE (Proposer-Agent-Evaluator) framework, focused on autonomous skill discovery.
Text-to-Image
Safetensors
yifeizhou
186
1
Command132
MIT
An any-to-any subnet model developed jointly by OMEGA Labs and Bittensor, supporting conversion across multiple task types.
Large Language Model, Other
mrbeanlas
0
0
Mini Omni2
MIT
Mini-Omni2 is a fully interactive multimodal model capable of understanding image, audio, and text inputs, and engaging in end-to-end voice conversations with users.
Multimodal Fusion
gpt-omni
192
269
Mixtral AI Vision 128k 7b
MIT
A multimodal model that combines vision and language capabilities, enabling image-text interaction through a model-merging approach.
Image-to-Text
Transformers, English

LeroyDyer
384
4
Featured Recommended AI Models